feat(rocev2): RoCEv2 RDMA-SEND datapath, drivers, and PRBS validation harness #15
ruck314 wants to merge 8 commits into
Pulls in the surf-side RoCEv2 work that the rest of this branch builds on:
- AXI-Stream RDMA datapath, engine refactor, and RoCEv2 rename
- DCQCN congestion control with runtime bypass
- regenerated blue-rdma transport engine for line-rate RDMA-SEND
- runtime ECN/DSCP IPv4 header register and AxiStreamMon frameUpdate

surf 634c8d9 -> dcc0155.
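What the DCQCN "runtime bypass" switches off is easiest to see against the sender-side (reaction point) rate cut of the original DCQCN scheme: on each congestion notification packet (CNP) the current rate is remembered as the target, cut by a factor of (1 - alpha/2), and alpha is pushed toward 1; while no CNPs arrive, alpha decays. The following is a minimal sketch of that textbook update with a bypass flag. It assumes rates in Gb/s and the commonly quoted gain g = 1/256. The class and attribute names are hypothetical, the rate-increase stages are omitted, and this is not surf's implementation.

```python
class DcqcnReactionPoint:
    """Textbook DCQCN rate-cut logic plus a bypass switch (illustrative only)."""

    def __init__(self, line_rate_gbps: float, g: float = 1.0 / 256, bypass: bool = False):
        self.line_rate = line_rate_gbps
        self.g = g                      # alpha gain
        self.bypass = bypass            # hypothetical stand-in for the runtime bypass
        self.current = line_rate_gbps   # RC: rate currently applied
        self.target = line_rate_gbps    # RT: rate to recover toward
        self.alpha = 1.0                # congestion estimate

    def on_cnp(self) -> None:
        """A CNP arrived from the notification point (the receiving NIC)."""
        if self.bypass:
            return  # congestion control disengaged: keep sending at line rate
        self.target = self.current
        self.current *= 1.0 - self.alpha / 2.0
        self.alpha = (1.0 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        """No CNP arrived during the alpha-update period: decay alpha."""
        if not self.bypass:
            self.alpha *= 1.0 - self.g


rp = DcqcnReactionPoint(10.0)
rp.on_cnp()
print(rp.current, rp.target)  # 5.0 10.0
```

With bypass=True the same CNP leaves the rate at 10.0 Gb/s. That is the posture you want on a point-to-point link with no switch to manage.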
Instantiate the surf RoCEv2AxiStreamRdma engine in the shared firmware datapath and thread a single 64-bit RDMA AXI-Stream pair through the hierarchy:
- CorePkg.vhd: define RDMA_AXIS_CONFIG_C (64-bit) as the shared RDMA stream config.
- App.vhd: drive SsiPrbsTx on RDMA_AXIS_CONFIG_C and wrap the payload with AxiStreamPacketizer2 (CRC_MODE_G="NONE", MAX_PACKET_BYTES_G=4096) ahead of the RDMA SEND.
- Core.vhd: pass the single RDMA master/slave pair through Core.
- Rudp.vhd: replace the bare engine with the RoCEv2AxiStreamRdma wrapper; default it to a point-to-point posture (DSCP_G=0, ECN_G="00" Not-ECT) so the host NIC keeps DCQCN disengaged unless configured at runtime.
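The DSCP_G/ECN_G defaults map onto the IPv4 DS field (the former TOS byte). DSCP occupies the upper six bits and ECN the lower two (RFC 2474 / RFC 3168). The following is a minimal sketch of that packing. It assumes ECN_G is a two-bit string, as the "00" default suggests. ipv4_tos_byte is a hypothetical helper, not a surf function.

```python
# Sketch only: hypothetical helper, not part of surf.
ECN_CODEPOINTS = {"00": "Not-ECT", "01": "ECT(1)", "10": "ECT(0)", "11": "CE"}  # RFC 3168


def ipv4_tos_byte(dscp: int, ecn: str) -> int:
    """Pack a 6-bit DSCP and a 2-bit ECN codepoint into the IPv4 DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if ecn not in ECN_CODEPOINTS:
        raise ValueError('ECN must be one of "00", "01", "10", "11"')
    return (dscp << 2) | int(ecn, 2)


# Point-to-point defaults from Rudp.vhd: DSCP_G=0, ECN_G="00"
print(hex(ipv4_tos_byte(0, "00")), ECN_CODEPOINTS["00"])  # 0x0 Not-ECT
```

With these defaults the byte is 0x00. There is no DSCP class, and the packets are Not-ECT, so nothing on the path can CE-mark them. The host NIC therefore has no reason to generate DCQCN congestion notifications.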
Add the three RoCEv2 KCU105 build targets (1GbE, 10GbE, RJ45), each with its top-level HDL, ruckus.tcl, Makefile, and promgen.tcl, and the supporting build/release plumbing:
- firmware/Makefile: top-level batch build/clean over all six targets.
- shared/ruckus.tcl + Simple* targets: hoist the Vivado VersionCheck 2023.1 out of each target into the shared ruckus.tcl (single source).
- releases.yaml: drop the Rogue packaging block and switch the release to FW_only (mcs/ltx) now that firmware ships independently of the driver.
- shared_version.mk: bump firmware version to v3.0.0.0.
Add the rocev2_10gbe_rudp_kcu105_example PyRogue package (Root/App and RoCEv2 bring-up sequencer) and teach the shared Core driver about RoCEv2:
- simple_10gbe_rudp_kcu105_example/_Core.py: add rocev2/dcqcn ctor knobs, add a UDP client (numClt=1), and instantiate surf RoCEv2AxiStreamRdma at 0x0015_0000 when rocev2 is enabled.
- rocev2_10gbe_rudp_kcu105_example/_Root.py: host<->FPGA RoCEv2 bring-up and tear-down sequencing, plus a packetizer CoreV2 stage that strips the AxiStreamPacketizer2 framing ahead of PrbsRx.
- rocev2_10gbe_rudp_kcu105_example/_App.py + __init__.py: App device tree and package exports.
Add the rocev2PrbsTest.py end-to-end PRBS validation harness for the RoCEv2 RDMA-SEND datapath. It derives PacketLength from maxPayload minus the 16 B packetizer overhead, exposes --p2p / throttle / checkPayload knobs, and gates pass/fail on rxErrors and FW-egress bandwidth telemetry.

Supporting environment updates:
- updateBootProm.py: extend the post-FpgaReload settle from 5 s to 10 s.
- setup_env_slac.sh: bump the conda env to rogue_v6.15.0.
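The PacketLength derivation and the pass/fail gate described above reduce to a few lines of arithmetic. The following is a minimal sketch. It assumes lengths are counted in bytes (the real harness may convert to 64-bit words for SsiPrbsTx) and that the caller chooses the bandwidth floor. prbs_payload_bytes, prbs_verdict, and min_gbps are hypothetical names, not the harness's actual API. The value 4096 is only an example.

```python
# Sketch only: names are hypothetical, not rocev2PrbsTest.py's real API.
PACKETIZER_OVERHEAD_BYTES = 16  # AxiStreamPacketizer2 framing, per the commit text


def prbs_payload_bytes(max_payload: int) -> int:
    """Largest PRBS frame that still fits in maxPayload once framing is added."""
    payload = max_payload - PACKETIZER_OVERHEAD_BYTES
    if payload <= 0:
        raise ValueError(
            f"maxPayload={max_payload} cannot hold the {PACKETIZER_OVERHEAD_BYTES} B overhead"
        )
    return payload


def prbs_verdict(rx_errors: int, egress_gbps: float, min_gbps: float) -> bool:
    """PASS only with zero PrbsRx errors AND FW-egress bandwidth at/above the floor."""
    return rx_errors == 0 and egress_gbps >= min_gbps


print(prbs_payload_bytes(4096))    # 4080
print(prbs_verdict(0, 9.2, 8.0))   # True
print(prbs_verdict(3, 9.2, 8.0))   # False: PRBS errors
```

Gating on both conditions matters. A throttled run can be error-free yet slow, and a fast run can still corrupt data, so neither check alone shows the line-rate path works.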
Hi @ruck314,

This can happen when running the script multiple times, but also on the first run after FPGA programming. Sometimes after the error the RoCEv2Engine goes into timeout trying the

Maybe I'm running the script wrong?
Root.start() now unwinds through stop() if the host<->FPGA hand-off throws, so a failed bring-up cannot leak the transport/poll threads or a half-open QP. Root.stop() runs teardownConnection() then an Engine SoftReset inside a try/finally. This forces the FW RoceConfigurator back to IDLE (it has no response timeout, so an out-of-order DESTROY could otherwise wedge it) and guarantees the transport is always released.

setP2pMode() is decoupled from the RNR timer: min_rnr_timer is QP state owned solely by transportCfg, so a slow softRoCE responder can keep a larger backoff while FW DCQCN is still bypassed.

rocev2PrbsTest.py defaults --trigRate per device (5 kHz for softRoCE vs 25 kHz for an mlx5 NIC). Under --p2p on softRoCE, it keeps the configured --minRnrTimer instead of forcing the minimal code-1 backoff, which storms RNR NAKs on a kernel responder. Adds FW diagnostic counters (Unsuccess/DmaRead/Oversize) to the result summary for one-shot failure classification.

Validated on a softRoCE-only host: 25/25 PRBS runs pass with zero errors and no teardown segfault (paired with the rogue rocev2 Server teardown fix).
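The start/stop unwinding described above follows a common try/except/finally shape. The following is a minimal, framework-free sketch. RootSketch and its private helpers (_open_transport, _handoff, _engine_soft_reset, _release_transport) are hypothetical stand-ins that only log what they would do. Only start(), stop(), and teardownConnection() are named in the commit message, and the real Root is a PyRogue Root.

```python
class RootSketch:
    """Hypothetical stand-in for the RoCEv2 Root; every step only logs."""

    def __init__(self, fail_handoff: bool = False):
        self.fail_handoff = fail_handoff
        self.transport_open = False
        self.log = []

    # --- hypothetical helpers -------------------------------------------
    def _open_transport(self):
        self.transport_open = True  # e.g. transport + poll threads started
        self.log.append("transport-open")

    def _handoff(self):
        self.log.append("handoff")  # host<->FPGA QP exchange
        if self.fail_handoff:
            raise TimeoutError("host<->FPGA hand-off failed")

    def teardownConnection(self):
        self.log.append("teardown")  # e.g. destroy the QP

    def _engine_soft_reset(self):
        self.log.append("soft-reset")  # forces the FW configurator back to IDLE

    def _release_transport(self):
        self.transport_open = False
        self.log.append("transport-released")

    # --- the pattern described in the commit message ---------------------
    def start(self):
        self._open_transport()
        try:
            self._handoff()
        except Exception:
            self.stop()  # unwind so no threads or half-open QP leak
            raise        # still report the original failure to the caller

    def stop(self):
        try:
            self.teardownConnection()
            self._engine_soft_reset()
        finally:
            self._release_transport()  # runs even if teardown or reset raises


root = RootSketch(fail_handoff=True)
try:
    root.start()
except TimeoutError as err:
    print("start failed:", err)
print(root.transport_open, root.log)
```

Re-raising after stop() keeps the failure visible to the caller. The finally clause in stop() is what guarantees the transport is released on every path.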
surf dcc0155 -> 142903a: RoCEv2 UdpEngine ECN/DSCP fix (move ECN/DSCP out of the localMac word bits) and AxiStream/UdpEngine PR-review fixes.
ruckus 4c60fea -> 684ecc6: build-system updates (versal 2026.1, release compare-URL, docs Pages deploy).
Thanks @FilMarini, you're not running it wrong. I reproduced it and root-caused two real bugs:
1. Intermittent
2.
Side note on your other observation: on a host that also has a hard-RoCE NIC, make sure the IPv4-mapped GID index is used; rxe index 0 is the link-local

Validated on a soft-RoCE-only host: 25/25 PRBS runs pass, zero errors, no segfault (with rogue#1276). The app-side fixes are pushed here; please re-test once #1276 merges.
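The GID-index advice above can be checked in code. An IPv4-mapped GID has the form ::ffff:a.b.c.d, whereas index 0 on rxe is a link-local fe80:: entry. The following is a minimal sketch. It assumes the port's GID table has already been read as a list of strings, for example from the Linux sysfs files under /sys/class/infiniband/<device>/ports/<port>/gids/. first_ipv4_mapped_gid is a hypothetical helper, not part of rogue or this PR. A fuller check would also confirm that the entry's type is RoCE v2, which sysfs exposes alongside the GID table. The example table values are illustrative.

```python
import ipaddress
from typing import Optional, Sequence


def first_ipv4_mapped_gid(gids: Sequence[str]) -> Optional[int]:
    """Index of the first IPv4-mapped GID (::ffff:a.b.c.d) in a port's table."""
    for index, gid in enumerate(gids):
        try:
            addr = ipaddress.IPv6Address(gid.strip())
        except ValueError:
            continue  # skip entries that are not valid IPv6-format GIDs
        if addr.ipv4_mapped is not None:
            return index
    return None


# Example table: index 0 link-local (fe80::/64), index 1 IPv4-mapped 192.168.1.2
table = [
    "fe80:0000:0000:0000:0202:b3ff:fe1e:8329",
    "0000:0000:0000:0000:0000:ffff:c0a8:0102",
]
print(first_ipv4_mapped_gid(table))  # 1
```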
Summary
Adds end-to-end RoCEv2 RDMA-SEND support to the Simple-10GbE-RUDP-KCU105 example: a hardware RDMA datapath built on the surf RoCEv2AxiStreamRdma engine, three new RoCEv2 build targets, a PyRogue device driver with host↔FPGA bring-up sequencing, and the rocev2PrbsTest.py line-rate PRBS validation harness.

The branch history is organized into five reviewable commits, one per feature:
- chore(rocev2): bump surf - surf 634c8d9 → dcc0155: AXI-Stream RDMA datapath + engine refactor + RoCEv2 rename, DCQCN congestion control with runtime bypass, regenerated blue-rdma transport engine, runtime ECN/DSCP IPv4 header register, AxiStreamMon frameUpdate.
- feat(rocev2): wire RoCEv2 RDMA engine into shared RTL - instantiate RoCEv2AxiStreamRdma in the shared datapath; thread a single 64-bit RDMA AXI-Stream pair through CorePkg/Core/App/Rudp; wrap the SsiPrbsTx payload with AxiStreamPacketizer2 (CRC_MODE_G="NONE", MAX_PACKET_BYTES_G=4096) ahead of the RDMA SEND; default to a point-to-point posture (DSCP_G=0, ECN_G="00" Not-ECT) so the host NIC keeps DCQCN disengaged unless configured at runtime.
- feat(rocev2): add RoCEv2 build targets + plumbing - three RoCEv2 KCU105 targets (1GbE / 10GbE / RJ45) with HDL, ruckus.tcl, Makefile, and promgen.tcl; a top-level firmware/Makefile batch build over all six targets; hoist the Vivado VersionCheck 2023.1 out of each Simple* target into the shared ruckus.tcl; switch releases.yaml to FW_only (mcs/ltx); bump the firmware version to v3.0.0.0.
- feat(rocev2): add RoCEv2 PyRogue device driver - new rocev2_10gbe_rudp_kcu105_example package (Root/App + host↔FPGA bring-up/tear-down sequencer, with a packetizer.CoreV2 depacketizer stage ahead of PrbsRx); teach the shared Core driver about the rocev2/dcqcn knobs and instantiate RoCEv2AxiStreamRdma at 0x0015_0000.
- feat(rocev2): add rocev2PrbsTest.py line-rate PRBS harness - end-to-end PRBS validation that derives PacketLength from maxPayload minus the 16 B packetizer overhead; --p2p / throttle / checkPayload knobs; pass/fail gated on rxErrors and FW-egress bandwidth telemetry. Plus the updateBootProm.py post-reload settle 5 s → 10 s and the setup_env_slac.sh conda env → rogue_v6.15.0.

Test plan
- cd docs && make html) succeeds locally
- rocev2PrbsTest.py (throttled, checkPayload=True) → PASS (rxErrors=0), confirming that frames pass through the FW packetizer → RDMA SEND → SW CoreV2 depacketizer → PrbsRx chain and are reassembled and validated intact.